\htmlhr
\chapterAndLabel{Tainting Checker}{tainting-checker}

The Tainting Checker prevents certain kinds of trust errors.
A \emph{tainted}, or untrusted, value is one that comes from an arbitrary,
possibly malicious source, such as user input or unvalidated data.
In certain parts of your application, using a tainted value can compromise
the application's integrity, causing it to crash, corrupt data, leak
private data, etc.

% Ought to have many more examples

For example, a user-supplied pointer, handle, or map key should be
validated before being dereferenced.
As another example, a user-supplied string should not be concatenated into a
SQL query, lest the program be subject to a
\href{https://en.wikipedia.org/wiki/Sql_injection}{SQL injection} attack.
A location in your program where malicious data could do damage is
called a \emph{sensitive sink}.

A program must ``sanitize'' or ``untaint'' an untrusted value before using
it at a sensitive sink.  There are two general ways to untaint a value:
by checking
that it is innocuous/legal (e.g., it contains no characters that can be
interpreted as SQL commands when pasted into a string context), or by
transforming the value to be legal (e.g., quoting all the characters that
can be interpreted as SQL commands).  A correct program must use one of
these two techniques so that tainted values never flow to a sensitive sink.
The Tainting Checker ensures that your program does so.

If the Tainting Checker issues no warning for a given program, then no
tainted value ever flows to a sensitive sink.  However, your program is not
necessarily free from all trust errors.  As a simple example, you might
have forgotten to annotate a sensitive sink as requiring an untainted type,
or you might have forgotten to annotate untrusted data as having a tainted
type.

To run the Tainting Checker, supply the
\code{-processor TaintingChecker}
or
\code{-processor org.checkerframework.checker.tainting.TaintingChecker}
command-line option to javac.
%TODO: For examples, see Section~\ref{tainting-examples}.


\sectionAndLabel{Tainting annotations}{tainting-annotations}

% TODO: add both qualifiers explicitly, and then describe their relationship.

The Tainting type system uses the following annotations:
\begin{description}
\item[\refqualclass{checker/tainting/qual}{Untainted}]
  indicates
  a type that includes only untainted (trusted) values.
\item[\refqualclass{checker/tainting/qual}{Tainted}]
  indicates
  a type that may include tainted (untrusted) or untainted (trusted) values.
  \code{@Tainted} is a supertype of \code{@Untainted}.
  It is the default qualifier.
\item[\refqualclass{checker/tainting/qual}{PolyTainted}]
  indicates qualifier polymorphism.
  For a description of qualifier polymorphism, see
  Section~\ref{method-qualifier-polymorphism}.
\end{description}


\sectionAndLabel{Tips on writing \code{@Untainted} annotations}{writing-untainted}

Most programs are designed with a boundary that surrounds sensitive
computations, separating them from untrusted values.  Outside this
boundary, the program may manipulate malicious values, but no malicious
values ever pass the boundary to be operated upon by sensitive
computations.

In some programs, the area outside the boundary is very small:  values are
sanitized as soon as they are received from an external source.  In other
programs, the area inside the boundary is very small:  values are sanitized
only immediately before being used at a sensitive sink.  Either approach
can work, so long as every possibly-tainted value is sanitized before it
reaches a sensitive sink.

Once you determine the boundary, annotating your program is easy:  put
\code{@Tainted} outside the boundary, \code{@Untainted} inside, and
\code{@SuppressWarnings("tainting")} at the validation or
sanitization routines that are used at the boundary.
% (Or, the Tainting Checker may indicate to you that the boundary
% does not exist or has holes through which tainted values can pass.)

The Tainting Checker's standard default qualifier is \code{@Tainted} (see
Section~\ref{defaults} for overriding this default).  This is the safest
default, and the one that should be used for all code outside the boundary
(for example, code that reads user input).  You can set the default
qualifier to \code{@Untainted} in code that may contain sensitive sinks.

The Tainting Checker does not know the intended semantics of your program,
so it cannot warn you if you mis-annotate a sensitive sink as taking
\code{@Tainted} data, or if you mis-annotate external data as
\code{@Untainted}.  So long as you correctly annotate the sensitive sinks
and the places that untrusted data is read, the Tainting Checker will
ensure that all your other annotations are correct and that no undesired
information flows exist.

As an example, suppose that you wish to prevent SQL injection attacks.  You
would start by annotating the
\sunjavadoc{java.sql/java/sql/Statement.html}{Statement} class to indicate that the
\code{execute} operations may only operate on untainted queries
(Chapter~\ref{annotating-libraries} describes how to annotate external
libraries):

\begin{Verbatim}
  public boolean execute(@Untainted String sql) throws SQLException;
  public boolean executeUpdate(@Untainted String sql) throws SQLException;
\end{Verbatim}


\sectionAndLabel{\code{@Tainted} and \code{@Untainted} can be used for many purposes}{tainting-many-uses}

The \code{@Tainted} and \code{@Untainted} annotations have only minimal
built-in semantics.  In fact, the Tainting Checker provides only a small
amount of functionality beyond the Subtyping Checker
(Chapter~\ref{subtyping-checker}).  This lack of hard-coded behavior has
two consequences.  The first consequence is that
the annotations can serve many different purposes, such as:

\begin{itemize}
\item
  Prevent SQL injection attacks:  \code{@Tainted} is external input,
  \code{@Untainted} has been checked for SQL syntax.
\item
  Prevent cross-site scripting attacks:  \code{@Tainted} is external input,
  \code{@Untainted} has been checked for JavaScript syntax.
\item
  Prevent information leakage:  \code{@Tainted} is secret data,
  \code{@Untainted} may be displayed to a user.
\end{itemize}

The second consequence is that the Tainting Checker is not useful unless
you annotate the appropriate sources, sinks, and untainting/sanitization
routines.
% This is similar to the \code{@Encrypted} annotation
% (Section~\ref{encrypted-example}), where the cryptographic functions are
% beyond the reasoning abilities of the type system.  In each case, the type
% system verifies most of your code, and the \code{@SuppressWarnings}
% annotations indicate the few places where human attention is needed.


If you want more specialized semantics, or you want to annotate multiple
types of tainting (for example, HTML and SQL) in a single program,
then you can copy the definition of
the Tainting Checker to create a new annotation and checker with a more
specific name and semantics.  You will change the copy to rename the
annotations, and you will annotate libraries and/or your code to identify
sources, sinks, and validation/sanitization routines.
See Chapter~\ref{creating-a-checker} for more
details.


\sectionAndLabel{A caution about polymorphism and side effects}{tainting-polymorphism-caution}

Misuse of polymorphism can lead to unsoundness with the Tainting Checker
and other similar information flow checkers. To understand the potential
problem, consider the \code{append} function in
\code{java.lang.StringBuffer}:

\begin{Verbatim}
  public StringBuffer append(StringBuffer this, String toAppend);
\end{Verbatim}

Given these declarations:

\begin{Verbatim}
  @Tainted StringBuffer tsb;
  @Tainted String ts;
  @Untainted StringBuffer usb;
  @Untainted String us;
\end{Verbatim}

\noindent
both of these invocations should be legal:

\begin{Verbatim}
  tsb.append(ts);
  usb.append(us);
\end{Verbatim}

That suggests that perhaps the function should be annotated as polymorphic:

\begin{Verbatim}
  // UNSOUND annotation -- do not do this!
  public @PolyTainted StringBuffer append(@PolyTainted StringBuffer this, @PolyTainted String toAppend);
\end{Verbatim}

The problem with the above annotation is that it permits the undesirable
invocation:

\begin{Verbatim}
  usb.append(ts); // illegal invocation
\end{Verbatim}

\noindent
This invocation is permitted because, in the expression, all
\<@PolyTainted> annotations on formal parameters are instantiated to
\<@Tainted>, the top annotation, and each argument is a subtype of the
corresponding formal parameter.

Beware this problem both in code you write, and also in annotated libraries
(such as stub files).  The correct way to annotate this class is to
add a class qualifier parameter; see Section~\ref{class-qualifier-polymorphism}.

(Side note:  if \code{append} were purely functional (had no side effects
and returned a new \<StringBuffer>) the method call would be acceptable,
because the return type is instantiated to \<@Tainted StringBuffer> for the
expression \<usb.append(ts)>.  However, the \code{append} method works via
side-effect, and only returns a reference to the buffer as a convenience
for writing ``fluent'' client code.)

% TODO: one could add a String[] value to @Untainted to distinguish different
% values, eg @Untainted{``SQL''} versus @Untainted{``HTML''}.

% LocalWords:  quals untaint PolyTainted mis untainting sanitization java
%%  LocalWords:  TaintingChecker untrusted unvalidated usb PolyDet
